[CORE] Make LeafTransformSupport's getPartitions return Seq[Partition] by Zouxxyy · Pull Request #10838 · apache/gluten

Zouxxyy · 2025-10-03T10:40:20Z

What changes are proposed in this pull request?

InputPartition is only a DSv2 interface, whereas Partition is Spark's common RDD partition abstraction.
LeafTransformSupport's getPartitions should return Seq[Partition], for example:

FileSourceScanExecTransformer returns Seq[FilePartition]
BatchScanExecTransformer returns Seq[DataSourceRDDPartition]
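A minimal Scala sketch of the proposed contract, for illustration only. `Partition`, `InputPartition`, `FilePartition` and `DataSourceRDDPartition` below are simplified stand-ins (assumptions, so the snippet compiles without Spark). `LeafTransformSupport`, `FileScanLeaf` and `BatchScanLeaf` are hypothetical reductions to the one method discussed here, not Gluten's actual classes.

```scala
// Simplified stand-ins for Spark types (assumption: the real classes carry more state).
trait Partition extends Serializable { def index: Int }
trait InputPartition extends Serializable

final case class FilePartition(index: Int, files: Seq[String]) extends Partition
final case class DataSourceRDDPartition(index: Int, inputPartitions: Seq[InputPartition])
  extends Partition

// Hypothetical, trimmed-down leaf contract: leaves return generic Spark partitions
// instead of DSv2 InputPartitions.
trait LeafTransformSupport {
  def getPartitions: Seq[Partition]
}

// A file-based scan hands back FilePartitions directly.
final class FileScanLeaf(files: Seq[Seq[String]]) extends LeafTransformSupport {
  override def getPartitions: Seq[Partition] =
    files.zipWithIndex.map { case (fs, i) => FilePartition(i, fs) }
}

// A DSv2 scan hands back one DataSourceRDDPartition per group of InputPartitions.
final class BatchScanLeaf(groups: Seq[Seq[InputPartition]]) extends LeafTransformSupport {
  override def getPartitions: Seq[Partition] =
    groups.zipWithIndex.map { case (g, i) => DataSourceRDDPartition(i, g) }
}
```

Because both leaves return the common `Partition` type, callers no longer need to know whether a scan is file-based or DSv2.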

This matters especially since Spark introduced Storage-Partitioned Join, because a single DataSourceRDDPartition may now contain multiple InputPartitions. See DataSourceRDD:

class DataSourceRDD(
    sc: SparkContext,
    @transient private val inputPartitions: Seq[Seq[InputPartition]],
    partitionReaderFactory: PartitionReaderFactory,
    columnarReads: Boolean,
    customMetrics: Map[String, SQLMetric])
  extends RDD[InternalRow](sc, Nil) {

  override protected def getPartitions: Array[Partition] = {
    inputPartitions.zipWithIndex.map {
      case (inputPartitions, index) => new DataSourceRDDPartition(index, inputPartitions)
    }.toArray
  }
}

Because of this mismatch, some if-else branches were previously introduced to handle the two cases; this PR cleans them up.
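To illustrate why the if-else branches become unnecessary, here is a hedged sketch that assumes simplified stand-ins for Spark's partition types. `SplitInfo`, `SplitInfos.toSplitInfo` and `describe` are hypothetical names, not Gluten's API. A single function turns any leaf's `Seq[Partition]` into per-task split descriptors by matching on the concrete partition type. As a result, a key-grouped `DataSourceRDDPartition` holding several `InputPartition`s needs no separate code path.

```scala
// Simplified stand-ins for Spark types (assumption: not the real Spark classes).
trait Partition extends Serializable { def index: Int }
trait InputPartition extends Serializable { def describe: String }

final case class FilePartition(index: Int, files: Seq[String]) extends Partition
// Under Storage-Partitioned Join one partition may group several InputPartitions.
final case class DataSourceRDDPartition(index: Int, inputPartitions: Seq[InputPartition])
  extends Partition

// Hypothetical split descriptor: everything a single task has to read.
final case class SplitInfo(index: Int, sources: Seq[String])

object SplitInfos {
  // One code path for every leaf: dispatch on the concrete Partition type instead of
  // branching on a separate flag that says whether partitions are key-grouped.
  def toSplitInfo(p: Partition): SplitInfo = p match {
    case f: FilePartition          => SplitInfo(f.index, f.files)
    case d: DataSourceRDDPartition => SplitInfo(d.index, d.inputPartitions.map(_.describe))
    case other => throw new IllegalArgumentException(s"Unsupported partition type: $other")
  }

  def fromLeaf(partitions: Seq[Partition]): Seq[SplitInfo] = partitions.map(toSplitInfo)
}
```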

How was this patch tested?

github-actions · 2025-10-03T10:40:48Z

Run Gluten Clickhouse CI on x86

Zouxxyy · 2025-10-07T15:36:57Z

Run Gluten Clickhouse CI on x86

zhztheplayer · 2025-10-08T13:52:58Z

Thanks.

Should we also refactor GlutenWholeStageColumnarRDD? Does soft-affinity still work correctly after the change?

Zouxxyy · 2025-10-08T14:16:26Z

Should we also refactor GlutenWholeStageColumnarRDD?

For GlutenWholeStageColumnarRDD, I think the modifications below are sufficient. Do you think they should be made in this PR as well? (Besides, NativeFileScanColumnarRDD doesn't seem to be used. Should I delete it?)

case class FirstZippedPartitionsPartition(
    index: Int,
    inputPartition: Partition,
    inputColumnarRDDPartitions: Seq[Partition] = Seq.empty)
  extends Partition

class GlutenWholeStageColumnarRDD(
    @transient sc: SparkContext,
    @transient private val inputPartitions: Seq[Partition],
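The following sketch shows how an RDD in the spirit of GlutenWholeStageColumnarRDD might assemble its partitions once it receives a plain `Seq[Partition]`. It assumes a stand-in `Partition` trait. `ZippedPartitions.build` is a hypothetical helper, and the real class's other constructor parameters (truncated in the excerpt above) are not modelled.

```scala
// Simplified stand-in for Spark's Partition (assumption: not the real class).
trait Partition extends Serializable { def index: Int }

// Same shape as the FirstZippedPartitionsPartition quoted above.
final case class FirstZippedPartitionsPartition(
    index: Int,
    inputPartition: Partition,
    inputColumnarRDDPartitions: Seq[Partition] = Seq.empty)
  extends Partition

object ZippedPartitions {
  // Hypothetical helper: pair the i-th leaf partition with the i-th partition of each
  // upstream columnar RDD. Every input must expose the same number of partitions.
  def build(
      inputPartitions: Seq[Partition],
      upstream: Seq[Seq[Partition]] = Seq.empty): Seq[FirstZippedPartitionsPartition] = {
    require(
      upstream.forall(_.size == inputPartitions.size),
      "all zipped inputs must have the same number of partitions")
    inputPartitions.zipWithIndex.map { case (p, i) =>
      FirstZippedPartitionsPartition(i, p, upstream.map(_(i)))
    }
  }
}
```

Because `inputPartition` is typed as the generic `Partition`, the same wrapper can carry a `FilePartition` or a `DataSourceRDDPartition` without further changes.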

Does soft-affinity still work correctly after the change?

Do you mean getPreferredLocations? I haven't modified its logic, so it still applies.

For example, GlutenWholeStageColumnarRDD still performs the correct cast:

  override def getPreferredLocations(split: Partition): Seq[String] = {
    castNativePartition(split)._1.preferredLocations()
  }

Or it works just like FileScanRDD in Apache Spark:

  override protected def getPreferredLocations(split: RDDPartition): Seq[String] = {
    split.asInstanceOf[FilePartition].preferredLocations()
  }
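The cast-based locality lookup can be sketched as follows, assuming simplified stand-ins for Spark's types. The `DataSourceRDDPartition` branch is assumed to mirror Spark's `DataSourceRDD`, which collects the preferred locations of every grouped `InputPartition`. `Locality` is a hypothetical name, and the `FilePartition` here skips Spark's real host-ranking logic.

```scala
// Simplified stand-ins for Spark types (assumption: not the real Spark classes).
trait Partition extends Serializable { def index: Int }
trait InputPartition extends Serializable {
  def preferredLocations(): Array[String] = Array.empty
}

// Simplified: Spark's FilePartition ranks hosts by bytes; here it just returns its hosts.
final case class FilePartition(index: Int, hosts: Seq[String]) extends Partition {
  def preferredLocations(): Array[String] = hosts.toArray
}
final case class DataSourceRDDPartition(index: Int, inputPartitions: Seq[InputPartition])
  extends Partition

object Locality {
  // The split arrives as a generic Partition, so it is matched back to the
  // concrete type before asking for locality hints.
  def preferredLocations(split: Partition): Seq[String] = split match {
    case f: FilePartition => f.preferredLocations().toSeq
    // A grouped DSv2 partition prefers the hosts of every InputPartition it holds.
    case d: DataSourceRDDPartition => d.inputPartitions.flatMap(_.preferredLocations().toSeq)
    case _ => Seq.empty
  }
}
```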

zhztheplayer · 2025-10-10T12:36:20Z

For GlutenWholeStageColumnarRDD, I think these modifications below are sufficient. Do you think it needs to be modified in this PR as well?

Yes, and I see you're already working on it. Thanks.

Besides, NativeFileScanColumnarRDD doesn't seem to be used. Should I delete it?

For sure. Thanks.

zhztheplayer · 2025-10-15T11:27:13Z

gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala

-      })
-
-    val allSplitInfos = getSplitInfosFromPartitions(isKeyGroupPartition, leafTransformers)
+    val allSplitInfos = leafTransformers.map(_.getSplitInfos).transpose


Should we preserve a similar comment here? It would help users understand the .transpose usage.
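The `.transpose` in the diff is easiest to see with plain collections. This sketch assumes that `getSplitInfos` yields one entry per partition for each leaf transformer; `SplitInfo` and `TransposeDemo` are hypothetical stand-ins. `transpose` turns the leaf-major nesting into a partition-major one, so each inner list holds what a single whole-stage task must read.

```scala
object TransposeDemo {
  // Hypothetical stand-in: the split info for one leaf in one partition.
  final case class SplitInfo(leaf: String, partition: Int)

  // Outer Seq = leaf transformers, inner Seq = that leaf's splits, one per partition.
  def perLeaf(leaves: Seq[String], numPartitions: Int): Seq[Seq[SplitInfo]] =
    leaves.map(leaf => (0 until numPartitions).map(p => SplitInfo(leaf, p)))

  // transpose flips the nesting: outer Seq = partitions, inner Seq = one split per leaf.
  def perPartition(splits: Seq[Seq[SplitInfo]]): Seq[Seq[SplitInfo]] = splits.transpose
}
```

Note that `transpose` requires every leaf to report the same number of partitions. Scala throws an `IllegalArgumentException` for ragged input, which makes a mismatch between leaf transformers fail fast.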

Zouxxyy · 2025-10-16T07:33:19Z

@zhztheplayer I've rebased and resolved the conflict. Could you take a look? Thanks.

zhztheplayer

Thank you for the refactor. The code is much clearer now.

github-actions bot added the CORE, VELOX and DATA_LAKE labels Oct 3, 2025

github-actions bot added the CLICKHOUSE label Oct 3, 2025

Zouxxyy force-pushed the dev/fix-scan branch from f2ff333 to b96f7d2 October 3, 2025 13:23

Zouxxyy force-pushed the dev/fix-scan branch from b96f7d2 to e33eba8 October 3, 2025 13:26

Zouxxyy force-pushed the dev/fix-scan branch from e33eba8 to 07c845d October 3, 2025 14:34

Zouxxyy force-pushed the dev/fix-scan branch from d35bfb5 to 959120c October 5, 2025 09:04

Zouxxyy force-pushed the dev/fix-scan branch from bcee953 to b481a96 October 5, 2025 13:40

Zouxxyy marked this pull request as draft October 5, 2025 15:24

Zouxxyy force-pushed the dev/fix-scan branch from b481a96 to 0f3e831 October 5, 2025 16:14

Zouxxyy changed the title from "[CORE] Refactor BasicScanExecTransformer to make it accept Seq[Seq[InputPartition]] as InputPartitions" to "[CORE] Make LeafTransformSupport's getPartitions return Seq[Partition]" Oct 5, 2025

Zouxxyy force-pushed the dev/fix-scan branch from eb040e0 to 7b250cb October 5, 2025 18:04

Zouxxyy force-pushed the dev/fix-scan branch from 70881d2 to c7ae3cc October 6, 2025 05:55

Zouxxyy force-pushed the dev/fix-scan branch from 46de671 to 075cc55 October 7, 2025 09:37

Zouxxyy force-pushed the dev/fix-scan branch from 8bad7c0 to bb73467 October 7, 2025 13:29

Zouxxyy marked this pull request as ready for review October 8, 2025 13:35

zhztheplayer reviewed Oct 15, 2025

Zouxxyy force-pushed the dev/fix-scan branch from 9cb2f44 to dfc4507 October 15, 2025 12:05

v3 (dae6c16)

Zouxxyy force-pushed the dev/fix-scan branch from 6ed8e1f to dae6c16 October 16, 2025 04:29

zhztheplayer approved these changes Oct 16, 2025

zhztheplayer merged commit 2daa733 into apache:main Oct 16, 2025
57 checks passed
